Clustered linear regression

Authors

  • Bertan Ari
  • H. Altay Güvenir
Abstract

Clustered linear regression (CLR) is a new machine learning algorithm that improves the accuracy of classical linear regression by partitioning the training space into subspaces. CLR makes some assumptions about the domain and the data set. First, the target value is assumed to be a function of the feature values. Second, this function is assumed to have a linear approximation in each subspace. Finally, there are assumed to be enough training instances to determine the subspaces and their linear approximations successfully. Tests indicate that if these assumptions hold, CLR outperforms all other well-known machine-learning algorithms. Partitioning may continue until the linear approximation fits all the instances in the training set, which generally occurs when the number of instances in a subspace is less than or equal to the number of features plus one. Before that point, each new subspace will have a better-fitting linear approximation. However, continuing to partition in this way causes overfitting and gives less accurate results on test instances. The stopping point can be identified as the point at which relative error shows no significant decrease, or starts to increase. CLR uses a small portion of the training instances to determine the number of subspaces. The need for a large number of training instances makes this algorithm suitable for data mining applications.
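The stopping rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the partitioning scheme (equal-frequency bins along the first feature), the definition of relative error, the `min_gain` threshold, and all function names are assumptions made for illustration only. The sketch fits one least-squares line per subspace, holds out a small portion of the training instances, and stops increasing the number of subspaces once the held-out relative error no longer drops noticeably.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ coef

def fit_piecewise(X, y, k):
    """Assumed partitioning: k equal-frequency bins along feature 0."""
    edges = np.quantile(X[:, 0], np.linspace(0, 1, k + 1)[1:-1])
    bins = np.searchsorted(edges, X[:, 0])
    global_coef = fit_linear(X, y)
    models = []
    for b in range(k):
        mask = bins == b
        # With <= (features + 1) instances a line fits exactly, so such
        # subspaces fall back to the global model instead of a local fit.
        enough = mask.sum() > X.shape[1] + 1
        models.append(fit_linear(X[mask], y[mask]) if enough else global_coef)
    return edges, models

def predict_piecewise(edges, models, X):
    bins = np.searchsorted(edges, X[:, 0])
    y_hat = np.empty(len(X))
    for b, coef in enumerate(models):
        mask = bins == b
        y_hat[mask] = predict_linear(coef, X[mask])
    return y_hat

def relative_error(y, y_hat):
    """Assumed definition: absolute error relative to predicting the median."""
    return np.abs(y - y_hat).sum() / np.abs(y - np.median(y)).sum()

def choose_num_subspaces(X, y, max_k=16, min_gain=0.01, holdout=0.2, seed=0):
    """Increase k until held-out relative error stops decreasing significantly."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(holdout * len(X))
    val, train = idx[:n_val], idx[n_val:]
    best_k, best_err = 1, np.inf
    for k in range(1, max_k + 1):
        edges, models = fit_piecewise(X[train], y[train], k)
        err = relative_error(y[val], predict_piecewise(edges, models, X[val]))
        if best_err - err < min_gain:  # no significant decrease, or an increase
            break
        best_k, best_err = k, err
    return best_k, best_err

# Synthetic data with two linear regimes: y = |x| plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(500, 1))
y = np.abs(X[:, 0]) + rng.normal(0, 0.05, size=500)
best_k, best_err = choose_num_subspaces(X, y)
print(best_k, round(best_err, 3))
```

On data like this, a single global line cannot follow the kink at zero, so splitting into more subspaces helps only up to the point where the held-out error stops improving; that is the behaviour the stopping rule is meant to capture.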

Related resources

Design-adaptive Minimax Local Linear Regression for Longitudinal/clustered Data

This paper studies a weighted local linear regression smoother for longitudinal/clustered data, which takes a form similar to the classical weighted least squares estimate. As a hybrid of the methods of Chen and Jin (2005) and Wang (2003), the proposed local linear smoother maintains the advantages of both methods in computational and theoretical simplicity, variance minimization and bias reduc...

A multiple imputation approach to linear regression with clustered censored data.

We extend Wei and Tanner's (1991) multiple imputation approach in semi-parametric linear regression for univariate censored data to clustered censored data. The main idea is to iterate the following two steps: 1) using the data augmentation to impute for censored failure times; 2) fitting a linear model with imputed complete data, which takes into consideration of clustering among failure times...

Semiparametric Poisson Regression Model for Clustered Data

A semiparametric Poisson regression is proposed in modeling spatially clustered count data. The heterogeneous covariate effect across the clusters is formulated in the context of nonparametric regression while the random clustering effect is based on a parametric specification. We propose two estimation procedures: (1) the parametric and nonparametric parts are estimated simultaneously via pena...

Outlier Detection with Two-Stage Area-Descent Method for Linear Regression

Outlier detection is an important task in many applications; it can lead to the discovery of unexpected, useful or interesting objects in data analysis. Many outlier detection methods are available. However, they are limited by assumptions in distribution or rely on many patterns to detect one outlier. Often, a distribution is not known, or experimental results may not provide enough informat...

Sensitivity analyses for clustered data: an illustration from a large-scale clustered randomized controlled trial in education.

In this paper, we demonstrate the importance of conducting well-thought-out sensitivity analyses for handling clustered data (data in which individuals are grouped into higher order units, such as students in schools) that arise from cluster randomized controlled trials (RCTs). This is particularly relevant given the rise in rigorous impact evaluations that use cluster randomized designs across...

Journal title:
  • Knowl.-Based Syst.

Volume 15  Issue 

Pages  -

Publication date 2002